Chinese sign language recognition based on surface electromyography and motion information

Sign language (SL) has strong structural features. Various gestures and the complex trajectories of hand movements bring challenges to sign language recognition (SLR). Based on the inherent correlation between the gesture and trajectory of an SL action, SLR is organically divided into gesture-based recognition and gesture-related movement trajectory recognition. One hundred and twenty commonly used Chinese SL words, involving 9 gestures and 8 movement trajectories, are selected as research and test objects. A method based on the amplitude state of the surface electromyography (sEMG) signal and the acceleration signal is used for vocabulary segmentation. The multi-sensor decision fusion method of the coupled hidden Markov model is used to complete the recognition of SL vocabulary, and the average recognition rate is 90.41%. Experiments show that the fusion of sEMG signal and motion information has good practicability in SLR.


Introduction
Sign language (SL) is the main way for deaf/mute individuals to communicate, which enables them to improve their social participation. Sign language recognition (SLR) uses computers to convert the information expressed by SL actions into specific target application information, which has become one of the research hotspots in the field of rehabilitation medicine [1,2].
Traditional SLR technologies based on images and data gloves do not easily meet the requirements of wearability and low cost [3,4], and the recognition method based on the combination of surface electromyography (sEMG) and motion information is gradually favored [5][6][7]. Chinese sign language (CSL) comprises approximately 5600 vocabulary words. At present, most research results only address preset test words and do not generalize to the entire vocabulary. Therefore, it is essential to put forward systematic solutions for the recognition of all CSL vocabulary.
Based on the structural characteristics of CSL, many scholars decompose it into pure structural elements for analysis and research, such as hand shape, orientation, posture, and position. Yang et al. [8] used the hand shape, orientation, position and other elements of gesture action to classify the vocabulary step by step. Although this method has a high recognition rate and accuracy, it recognizes only a small number of words and lacks systematization. Given the large number of gesture movements in CSL, Tigrini et al. [9] have shown that placing the collection device on the forearm or wrist is helpful in recognizing complex gesture movements. As the CSL action process has the complex characteristics of spatio-temporal information change, many scholars use multi-sensor information fusion technology to improve the accuracy of SLR. Yang et al. [10] fused image, sEMG and acceleration (ACC) sensor information to achieve a high recognition rate, but the system design is extremely complex. Tian et al. [11] used the data fusion of sEMG and ACC sensors and introduced a statistical language model to recognize SL, with a recognition rate of 90%. However, these studies directly input ACC eigenvalues and sEMG into the classifier and only take ACC as an auxiliary feature of the gesture state. They consider neither the internal correlation and dependence between gesture and movement trajectory nor the spatio-temporal attributes of gesture and trajectory in the formation process of CSL. This fusion method has great limitations in the vocabulary expansion of SLR. At present, CSL includes thousands of words. Identifying them one by one would not only impose a great burden of training and calculation but also complicate the recognition system.
The coupled hidden Markov model (cHMM) is a multi-stream Markov chain that describes the interaction of multiple random processes. It is highly suitable for the interactive fusion of multiple independent information streams [12]. CSL is a process in which gestures and trajectories are strongly correlated in space and time. The strong temporal coupling analysis ability of cHMM can effectively analyze the internal characteristics of successive gestures and trajectories in CSL. cHMM also has the advantage of analyzing the correlation and asynchronous characteristics of various source data streams [13]. When analyzing and processing the dual information flow of gesture based on sEMG and trajectory based on motion information, cHMM not only ensures that the algorithms for the two information flows can be implemented independently, but also considers the correlation between gesture and movement trajectory at a given time in the SL action. It is therefore very suitable for the fusion of gesture and movement trajectory in SLR. On the basis of summarizing and analyzing the internal characteristics of CSL formation, this study aims to organically decompose CSL into standardized gestures and gesture-related movement trajectories, fully analyze the mode features and motion features of sEMG and motion information, output the gesture and movement trajectories related to CSL, and then integrate the two with cHMM. A systematic SLR method with more universal applicability and a richer vocabulary is proposed.
This study proposes a general method to decompose Chinese sign language into 37 standardized gestures and 18 action trajectories, and uses the sEMG, ACC and angular velocity (AV) signals to comprehensively study the recognition of sign language words. The method was successfully applied to 120 common vocabulary words. This study provides a relevant foundation for the development of wearable sign language recognition devices with high real-time performance and high reliability.

Participants
We enrolled 8 participants (7 men and 1 woman, age: 22.1 ± 1.1 years (22-25 years)). All participants provided written informed consent, and the experimental procedures were approved by the local ethics committee of Hangzhou Dianzi University.

Experimental preparation
A total of 8 healthy volunteers (7 males and 1 female) aged 20-30 were recruited (all subjects were informed of the specific experimental procedures and potential risks, and signed informed consent forms). As shown in (Fig 1), each volunteer sat on a chair and performed the 120 CSL vocabulary actions in turn. The experimental acquisition system in (Fig 2) was used to record the corresponding sEMG, ACC, and AV signals. Based on these signals, 960 groups of data were collected, of which 480 groups were training samples and the other 480 groups were test samples. In the experiment, each state corresponded to a CSL vocabulary word, and the likelihood probability value of the corresponding state output of each model was calculated. The composite state with the largest probability value was the target vocabulary word.
The number of channels for sEMG signal acquisition and analysis influences the recognition performance, complexity and computational cost of the recognition system. In general, the number of channels should be reduced as much as possible while maintaining a good recognition rate, to reduce the system complexity and calculation. Therefore, according to the correlation between gestures and muscle groups, this study selects four muscle groups as signal acquisition objects: extensor carpi radialis (ECR), extensor digitorum (ED), flexor digitorum superficialis (FDS) and extensor pollicis brevis (EPB). The layout of the four-channel sEMG sensors is shown in (Fig 3).
The Trigno wireless sEMG acquisition system is used to build an experimental acquisition system based on sEMG and motion information. Each Trigno sensor has a built-in three-axis accelerometer, three-axis gyroscope and sEMG acquisition module, and can collect ACC signals, angular velocity (AV) signals and sEMG signals of the corresponding muscle groups in real time. The sampling frequency of the sensor is 1000 Hz. The duration of an SL action is approximately 2 s. During the experiment, the sensors recording sEMG signals were attached to the skin surface over the corresponding muscle groups of the participant's forearm. The sensor recording ACC and AV signals was attached near the wrist joint to facilitate more accurate detection of the spatial position and movement of the hand, as shown in (Fig 2).

Chinese sign language (CSL) decomposition
CSL is developed on the basis of finger letter gestures [14], which are the research basis of CSL gestures. At present, CSL has evolved into a complex dynamic mode accompanied by limb movement in the formation and change of various gestures. Therefore, the main research contents of CSL include various gestures and gesture-related movement trajectories. Accordingly, this study proposes to decompose CSL into several standardized gestures and movement trajectories, organically divide the recognition of a large CSL vocabulary into gesture recognition and gesture-related movement trajectory recognition, and systematically propose an SLR scheme. Gesture recognition and trajectory recognition depend on the sEMG and the motion information of the CSL action, respectively. As sEMG and motion information signals have certain motion predictability [15], they are suitable for the recognition of various changing gestures and the tracking of movement trajectory. The organic combination of sEMG and motion information is also conducive to the study of the internal correlation and logic of CSL gesture and movement trajectory.
In Chinese, 30 finger letter gestures are used (Fig 4) [14,16], of which three are the same as others and differ only in direction, leaving 27 distinct gestures. In addition, 10 kinds of gestures (Fig 5) have been added in "Chinese Sign Language (Revised Edition)" [14]. Therefore, a total of 37 kinds of standardized CSL gestures are examined in this study. After repeated research and induction of all CSL movement trajectories, 18 kinds of regular trajectories are obtained, as shown in (Fig 6). Including the state of action rest, there are thus 19 kinds of movement states. According to the types of gestures, CSL is mainly divided into single hand gesture vocabulary (SHGV), double hand gesture vocabulary (DHGV) and dynamic gesture vocabulary (DGV). SHGV refers to the vocabulary expressed only by the action of the main hand (usually the right hand). DHGV refers to the vocabulary expressed by the main and auxiliary hands, whose gesture actions may be the same or different. In DGV, the gesture actions in a vocabulary expression cycle are not fixed but changeable. Therefore, through the organic combination of 37 gestures and 18 movement trajectories, the number of words that can be recognized in theory is 52022, which can cover all CSL vocabulary. As the joint activities of more than 20 degrees of freedom of the hand are driven by specific muscle groups [17,18], and the muscles are interrelated and coordinated, all gestures can be recognized by appropriately increasing the number and adjusting the position layout of sEMG sensors. The 18 movement trajectories can likewise be detected by various motion sensors. The CSL decomposition and recognition method based on standardized gesture and movement trajectory is shown in (Fig 7). First, CSL is decomposed into standardized gestures and movement trajectories. Then, gesture recognition and movement trajectory classification based on the gesture formation process are carried out separately. Finally, the recognized CSL target vocabulary is output through the cHMM fusion algorithm.
The CSL vocabulary is extensive. The scheme shown in (Fig 6) is a systematic SLR solution that reduces the recognition of all CSL vocabulary words to the recognition of 37 gestures and 18 movement trajectories. However, as the CSL vocabulary involves many uncommon words, identifying and analyzing all words would be lengthy and complicated, and it is not necessary. To facilitate the analysis, this study selects 120 words involving nine gestures and eight trajectories as an example to examine the problem of SLR based on gesture and movement trajectory decomposition. The 120 target words are presented in Table 1.

Gesture recognition
By collecting the sEMG signals of specific muscle groups and analyzing the pattern information, the nine corresponding gestures can be recognized from the sEMG signals of four muscle groups. The specific steps can be found in [19]. The definitions of the nine gestures and the corresponding CSL vocabulary are shown in (Fig 8). These nine gestures recur frequently in the CSL vocabulary, which makes it easier to explain intuitively how gestures and gesture-related movement trajectories are combined.

Movement trajectory recognition
The ACC and AV signals capture the movement trajectory information of the SL action. The trajectory is reconstructed algorithmically from these signals, and a trajectory classification method is used to distinguish the eight movement trajectories. The specific steps can be found in [20] (Fig 9).

Vocabulary segmentation based on sEMG and ACC dual-signal amplitude
A complete SL vocabulary word often involves several consecutive different gesture actions, which makes a gesture segmentation algorithm based entirely on sEMG signal amplitude unsuitable for vocabulary segmentation. In [21], the amplitude change of the sEMG signal is used as the judgment basis for the start and end points of a gesture action. An sEMG signal can represent the level of muscle activity. When the gesture is switched from one action to another, the corresponding muscle will relax temporarily. Therefore, the amplitude change information of the sEMG signal can be used for data segmentation of SHGV and DHGV. However, for DGV with multiple gesture combinations, such as the CSL vocabulary word "clear", judging only the sEMG signal amplitude is not enough. (Fig 10) shows the sEMG and ACC signal activity diagram in a vocabulary cycle. In the VCM (vertical circular arc movement) trajectory, the gesture changes from FFE (five fingers extended) to ET (extended thumb). During the gesture-switching process, the muscles relax briefly. If only the change of sEMG amplitude is used as the basis for vocabulary segmentation, the word "clear" is easily divided into two independent words, resulting in false recognition. In fact, the dynamic gesture process is also accompanied by violent fluctuation of the ACC signal. Accordingly, combining the amplitude changes of the sEMG and ACC signals enables more effective judgment of the start and end points of CSL activities. In this study, the absolute mean sliding window (AMSW) method is applied to the synchronized sEMG and ACC signals to detect the start and end points of CSL activity. The specific steps are as follows:
• The time series of the sEMG signal of channel k with length N is expressed as $x_k(i)$, $i = 1, 2, \ldots, N$. The absolute mean value of the signal sample is: $\mathrm{MAV}_k = \frac{1}{N}\sum_{i=1}^{N} |x_k(i)|$.
• The time series of the ACC signal of channel l with length M is expressed as $y_l(j)$, $j = 1, 2, \ldots, M$. The absolute mean value of the signal sample is: $\mathrm{MAV}_l = \frac{1}{M}\sum_{j=1}^{M} |y_l(j)|$.
• The length of the moving window is K and the step is T. When $K/3 \le T \le K/2$, better experimental results and computational efficiency are obtained. In the experiment, the moving window is K = 50 and the step is T = 25.
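The segmentation steps can be illustrated with a minimal sketch, assuming a single sEMG channel and a single ACC channel sampled synchronously. The function names, the synthetic signals and the threshold values (0.2 for both signals) are illustrative assumptions and not values from the study; only the window length K = 50, the step T = 25 and the rule "active if either mean absolute value exceeds its threshold" come from the text.

```python
from typing import List, Tuple

def sliding_mav(signal: List[float], window: int = 50, step: int = 25) -> List[float]:
    """Mean absolute value of each window of `window` samples, advanced by `step`."""
    return [
        sum(abs(v) for v in signal[start:start + window]) / window
        for start in range(0, len(signal) - window + 1, step)
    ]

def activity_states(semg: List[float], acc: List[float],
                    t_semg: float, t_acc: float,
                    window: int = 50, step: int = 25) -> List[int]:
    """Amplitude state per window: 1 if either MAV exceeds its threshold, else 0."""
    mav_semg = sliding_mav(semg, window, step)
    mav_acc = sliding_mav(acc, window, step)
    return [int(s > t_semg or a > t_acc) for s, a in zip(mav_semg, mav_acc)]

def active_segments(states: List[int], window: int = 50, step: int = 25) -> List[Tuple[int, int]]:
    """(start_sample, end_sample) of every run of consecutive active windows."""
    segments = []
    run_start = None
    for idx, state in enumerate(states):
        if state and run_start is None:
            run_start = idx
        elif not state and run_start is not None:
            segments.append((run_start * step, (idx - 1) * step + window))
            run_start = None
    if run_start is not None:
        segments.append((run_start * step, (len(states) - 1) * step + window))
    return segments

# Synthetic dynamic-gesture word: the muscle relaxes briefly in the middle
# (samples 200-249) while the hand keeps moving (hypothetical amplitudes).
SEMG = [0.0] * 100 + [1.0] * 100 + [0.0] * 50 + [1.0] * 100 + [0.0] * 100
ACC = [0.0] * 100 + [0.5] * 250 + [0.0] * 100

semg_only = active_segments(activity_states(SEMG, [0.0] * len(SEMG), 0.2, 0.2))
combined = active_segments(activity_states(SEMG, ACC, 0.2, 0.2))
print(semg_only)  # two segments: the word is split at the muscle relaxation
print(combined)   # one segment: the ACC activity bridges the gap
```

On this synthetic input, using the sEMG amplitude alone splits the word into two segments, whereas the combined sEMG and ACC amplitude state keeps it as one, which mirrors the behavior described for the word "clear".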

Decision fusion of gesture and movement trajectory based on cHMM
cHMM can be regarded as a multi-chain HMM structure in which a coupling conditional probability is introduced between the state sequences of the individual HMMs [22], as shown in (Fig 12). The model consists of two HMM chains named HMMa and HMMb, each of which contains a hidden state sequence and an observed value sequence. The number of hidden states can be set according to the actual application. The figure shows that the state of each HMM chain at any time depends only on the states of the two chains at the previous time. This single-channel asynchronous cHMM structure retains Markov characteristics, and SLR processes the combined information of gesture and movement trajectory. The gesture and movement trajectory data are two independent information streams, and their modal information is also a time-dependent sequence with Markov characteristics. Therefore, the integration of SL gesture and movement trajectory using cHMM is in accordance with its internal law.
The parameters of a cHMM with two chains are described as follows [12]:
• Q: State sequence of the model. The two chains are the gesture sequence and the movement trajectory sequence. Therefore, the state of the model at any time is the state combination of the two chains, which is a CSL vocabulary word. Let the number of states of the c-th chain be $N_c$; the number of states of the model is then $\prod_{c=1}^{2} N_c$, that is, the number of CSL words. Furthermore, the $N_c$ states of the c-th chain are $S^c_1, S^c_2, \cdots, S^c_{N_c}$, the state of the c-th chain at time t is $q^c_t$, and the state of the model at time t is $q_t = \{q^1_t, q^2_t\}$. Then the state sequence of the model is $Q = \{q_1, q_2, \cdots, q_T\} = \{(q^1_1, q^2_1), (q^1_2, q^2_2), \cdots, (q^1_T, q^2_T)\}$.
• O: Observed value sequence of the model. Similarly, the observation sequence of the model also includes the observation sequences of the two chains. Let the observation of the c-th chain at time t be $o^c_t$, so that the observation of the model at time t is $o_t = \{o^1_t, o^2_t\}$. Then the observation sequence of the model is $O = \{o_1, o_2, \cdots, o_T\} = \{(o^1_1, o^2_1), (o^1_2, o^2_2), \cdots, (o^1_T, o^2_T)\}$.
• π: Initial state probability vector, $\pi = \{\pi_i\}$, where $\pi_i$ is the prior probability of $S_i = \{S^1_{i_1}, S^2_{i_2}\}$ at time t = 1, that is, $\pi_i = P(q_1 = S_i)$.
• A: State transition probability matrix, $A = \{a_{i,j}\}$, where $a_{i,j}$ is the probability of model transfer from $S_i = \{S^1_{i_1}, S^2_{i_2}\}$ to $S_j = \{S^1_{j_1}, S^2_{j_2}\}$, that is, $a_{i,j} = P(q_t = S_j \mid q_{t-1} = S_i) = \prod_{c=1}^{2} a^c_{i,j_c}$, where $a^c_{i,j_c}$ represents the probability that the c-th chain is in state $S^c_{j_c}$ at the current time given that the model is in state $S_i = \{S^1_{i_1}, S^2_{i_2}\}$ at the previous time.
Similarly, the cHMM can be abbreviated as $\lambda = (\pi, A, B)$. In training, $\lambda$ is adjusted to maximize the probability of generating the observed value sequence. That is, a set of model parameters $\bar{\lambda}$ is found so that $\bar{\lambda} = \arg\max_{\lambda} P(O \mid \lambda)$. The preceding equation is a maximum likelihood estimation problem with hidden variable Q, and the expectation-maximization (EM) algorithm can be used to iteratively obtain a local optimal solution. According to [11,12], the estimated model parameter $\bar{\lambda} = (\bar{\pi}, \bar{A}, \bar{B})$ is obtained and satisfies $P(O \mid \bar{\lambda}) \ge P(O \mid \lambda)$. That is, the estimation formula always increases the probability $P(O \mid \lambda)$ until a local maximum is obtained. Taking the estimated model parameter $\bar{\lambda} = (\bar{\pi}, \bar{A}, \bar{B})$ as the new initial model parameter, we repeat the iterative steps of the EM algorithm until the probability $P(O \mid \bar{\lambda})$ converges. The final model parameter is the maximum likelihood estimate of the model, that is, the obtained cHMM. The combination state corresponding to the maximum output likelihood probability (i.e., the CSL vocabulary word) is the target object.
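The two-chain parameterization can be made concrete with a small sketch. It builds the composite transition probability as the product of the per-chain probabilities conditioned on the previous composite state, and evaluates $P(O \mid \lambda)$ with the standard forward recursion over composite states. All numbers are toy values, each chain has two states instead of nine, and the factorized emission $b_j(o_t) = b^1_{j_1}(o^1_t)\, b^2_{j_2}(o^2_t)$ is an assumption made for illustration; parameter estimation by EM is not shown.

```python
from itertools import product

N1, N2 = 2, 2  # toy sizes; the study uses nine states per chain
STATES = list(product(range(N1), range(N2)))  # composite states (i1, i2)

# Per-chain transition rows, conditioned on the previous composite state (toy values).
A1 = {(0, 0): [0.9, 0.1], (0, 1): [0.6, 0.4], (1, 0): [0.3, 0.7], (1, 1): [0.2, 0.8]}
A2 = {(0, 0): [0.8, 0.2], (0, 1): [0.5, 0.5], (1, 0): [0.4, 0.6], (1, 1): [0.1, 0.9]}
# Per-chain emission tables B[state][symbol] (toy values, assumed factorized).
B1 = [[0.9, 0.1], [0.2, 0.8]]
B2 = [[0.7, 0.3], [0.1, 0.9]]
# Uniform prior over composite states.
PI = {s: 1.0 / len(STATES) for s in STATES}

def composite_transition(a1, a2):
    """a[i][j] = a1[i][j1] * a2[i][j2] for composite states i and j = (j1, j2)."""
    return {i: {(j1, j2): a1[i][j1] * a2[i][j2] for (j1, j2) in STATES} for i in STATES}

def emission(state, obs):
    """Probability of the observation pair given the composite state."""
    return B1[state[0]][obs[0]] * B2[state[1]][obs[1]]

def forward_likelihood(pi, trans, observations):
    """P(O | lambda) computed with the forward recursion."""
    alpha = {s: pi[s] * emission(s, observations[0]) for s in STATES}
    for obs in observations[1:]:
        alpha = {
            j: sum(alpha[i] * trans[i][j] for i in STATES) * emission(j, obs)
            for j in STATES
        }
    return sum(alpha.values())

A = composite_transition(A1, A2)
likelihood = forward_likelihood(PI, A, [(0, 0), (0, 1), (1, 1)])
print(likelihood)
```

Because each per-chain row sums to one, every row of the composite matrix also sums to one, so the product form defines a valid transition matrix over the composite states.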

CSL decomposition status table
In the fusion experiment of the gesture and movement trajectory information of the CSL vocabulary using the cHMM method, the two HMM chains and the hidden states of the cHMM structure were defined first. The nine types of gesture recognition output were defined as the hidden states $(q^1_1, q^1_2, q^1_3, q^1_4, q^1_5, q^1_6, q^1_7, q^1_8, q^1_9)$ of the HMMa chain. The rest state and the eight types of movement trajectory recognition output were defined as the hidden states $(q^2_1, q^2_2, q^2_3, q^2_4, q^2_5, q^2_6, q^2_7, q^2_8, q^2_9)$ of the HMMb chain. The cHMM composite hidden states are the combinations of $(q^1_1, q^1_2, q^1_3, q^1_4, q^1_5, q^1_6, q^1_7, q^1_8, q^1_9)$ and $(q^2_1, q^2_2, q^2_3, q^2_4, q^2_5, q^2_6, q^2_7, q^2_8, q^2_9)$, up to 81 types in theory. The test vocabulary contains a total of 120 CSL words, including 45 words of SHGV, 52 words of DHGV and 23 words of DGV. Three decomposition state tables are established for the three types of vocabulary. Each vocabulary word in the table is represented as a composite state of a combination of standardized gestures and movement trajectories. After the three types of vocabulary are mapped to the state tables, the recognition of the CSL vocabulary is calculated as the likelihood probability value of the cHMM state output. Due to space limitations, this study includes only the representative vocabulary in the three status tables, which are explained together (Fig 13).
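A decomposition state table of this kind can be sketched as a lookup from words to composite states. In the sketch below only the labels FFE, ET and VCM and the decomposition of the word "clear" are taken from the text; the remaining gesture and trajectory labels, the word "word_a" and the scores are hypothetical placeholders.

```python
from itertools import product

# Nine gesture states (HMMa); only FFE and ET are named in the text, the rest are placeholders.
GESTURES = ["FFE", "ET"] + [f"G{k}" for k in range(3, 10)]
# Rest state plus eight trajectory states (HMMb); only VCM is named in the text.
TRAJECTORIES = ["REST", "VCM"] + [f"T{k}" for k in range(2, 9)]

# All composite hidden states (gesture, trajectory): 9 x 9 = 81 in theory.
COMPOSITE_STATES = set(product(GESTURES, TRAJECTORIES))

# Decomposition state table: each word is a sequence of composite states.
# A dynamic-gesture word such as "clear" passes through two gestures on one trajectory.
STATE_TABLE = {
    "clear": [("FFE", "VCM"), ("ET", "VCM")],
    "word_a": [("G3", "REST")],  # hypothetical single-gesture word
}

def recognize(log_likelihoods):
    """Return the word whose cHMM output likelihood is largest."""
    return max(log_likelihoods, key=log_likelihoods.get)

# Hypothetical log-likelihoods produced by the fusion stage for one segment.
scores = {"clear": -12.3, "word_a": -20.1}
print(len(COMPOSITE_STATES), recognize(scores))
```

Keeping the table separate from the models means that adding a word only requires a new table entry, as long as its gestures and trajectories are already among the recognized states.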

Decision fusion experiment based on cHMM
In the decision fusion experiment, first, the signal activity segment of a complete CSL vocabulary is intercepted by the vocabulary segmentation method.In the signal activity segment, three types of SHGV, DHGV and DGV are judged by the sEMG signal activity amplitude.The decision fusion experiment is conducted on the three types of vocabulary by using cHMM.
Wang et al. [23] used a three-axis ACC for the recognition of 8 custom gestures, achieving a recognition rate of 98.75%. Zhuang et al. [24] used signals from four forearm muscle groups and one palm muscle group to classify 18 Chinese Sign Language vocabulary words. The average recognition rate reached 91.4%. The main work of [23,24] is on gesture recognition, neglecting the action trajectories during the formation and transformation of gestures. This has resulted in a limited vocabulary for recognition. In this study, we integrate Chinese sign language gestures with action trajectory information, utilizing the cHMM multi-sensor information fusion approach to achieve the recognition of complex Chinese sign language vocabulary. The experimental results show that the decision fusion method of cHMM is effective for SL vocabulary recognition.

Discussion
The main purpose of this article is to study gesture recognition based on the fusion of gesture information and motion trajectory information.Using the amplitude states of synchronous sEMG signal and ACC signal to determine the starting and ending points of sign language activities, continuous sign language vocabulary is segmented.Then, utilizing the independent information flow of sign language gesture and action trajectory information, as well as the inherent sequence and logical correlation, the multi-sensor information fusion method of cHMM is used to complete the recognition of sign language vocabulary.By using gesture pattern types and action trajectory types as hidden states of the cHMM chains, the output probability values of each state observation value are fused using cHMM, solving the classification problem of large vocabulary sign language recognition systems.
The vocabulary of Chinese sign language is enormous, with a total of over 5600 words. If vocabulary words are identified individually or decomposed according to structural elements, the training burden becomes huge and the recognition system becomes complex. Kshitij et al. [1] and Heickal et al. [25] used computer vision to recognize and analyze sign language, achieving a recognition rate of 91% for 150 American sign language vocabulary words. Zhou et al. [26] designed a finger-worn data acquisition and recognition system using a three-axis ACC, achieving an average recognition rate of 80% for 16 gesture movements. Asif et al. [2] analyzed sign language recognition based on sEMG signals and ultimately achieved a recognition rate of 95% for 11 gestures. However, the above methods only have good recognition rates for single sign language actions, and often do not have high recognition rates for continuous sign language actions. Considering the inherent order and logic of gesture and action trajectory information in continuous sign language, this study uses the amplitude state of the sEMG and ACC signals to perform vocabulary segmentation and fuses sign language gesture and action trajectory to output the target vocabulary, which facilitates continuous sign language recognition and has a positive effect on recognition accuracy. This study collected sEMG and ACC data of 120 Chinese sign language vocabulary words from 8 volunteers. After cHMM decision fusion, the recognition rates of SHGV and DHGV vocabulary are 92.22% and 90.38%, respectively, and the recognition rate of DGV vocabulary is 86.95%. This article provides a relevant foundation for the development of wearable Chinese sign language recognition devices with high real-time performance and high reliability. However, this study also has many limitations in experiments and methods. At present, the method proposed in this study has only been tested on healthy individuals. In future work, we will cooperate with rehabilitation institutions for further experiments and research, and test this method on a group of deaf/mute patients receiving rehabilitation treatment. In addition, the target of the research method in this article is the entire Chinese sign language vocabulary. Currently, a library of only 120 commonly used vocabulary words has been established and tested. Therefore, expanding the test vocabulary library and building the corresponding gesture and action trajectory decomposition tables for the expanded vocabulary are also key challenges to be solved in the next step of research.

Conclusions
SLR is an important research direction in human-computer interaction. This study discusses in depth the SLR method based on sEMG and motion information fusion, and makes a beneficial attempt at the comprehensive and systematic recognition of CSL. First, according to the internal characteristics of CSL, it is divided into 37 standardized gestures and 18 movement trajectories, and all CSL vocabulary words are classified. Then, 120 commonly used target words, involving 9 gestures and 8 movement trajectories, are listed as research and test objects. Based on the acquisition system of sEMG and motion information as well as the vocabulary segmentation method of sEMG and ACC dual-signal amplitude state, the multi-sensor information decision fusion of cHMM is used to complete the recognition of CSL vocabulary. The average accuracy is 90.41%.
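As a consistency check, the reported average can be reproduced from the per-category figures given in the text, assuming the average is weighted by the number of words in each vocabulary type (45 SHGV, 52 DHGV and 23 DGV words). The variable names are illustrative.

```python
# Number of test words and reported recognition rate (%) per vocabulary type.
counts = {"SHGV": 45, "DHGV": 52, "DGV": 23}
rates = {"SHGV": 92.22, "DHGV": 90.38, "DGV": 86.95}

total_words = sum(counts.values())
# Average weighted by the number of words in each category.
weighted_average = sum(counts[k] * rates[k] for k in counts) / total_words

print(total_words, round(weighted_average, 2))
```

Under this assumption the weighted average rounds to 90.41%, in agreement with the reported overall accuracy.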

• The activity thresholds of the time series sEMG and ACC signals are set to $T_{sEMG}$ and $T_{ACC}$. When $\mathrm{MAV}_k > T_{sEMG}$ or $\mathrm{MAV}_l > T_{ACC}$, the amplitude state is set to 1, and the signal is in the active section of the gesture action. The values of the activity thresholds $T_{sEMG}$ and $T_{ACC}$ are obtained by combining the signal activity start and stop situations of the 8 experimental volunteers during sign language action training. The CSL activity detection steps in (Fig 10) are shown in (Fig 11).

Fig 10. Signal activity diagram of a complete vocabulary word "clear (清)". (a) The sign language trajectory of the word "clear" (清). (b) The change of sEMG signal in one vocabulary cycle. (c) The change of ACC signal in one vocabulary cycle. https://doi.org/10.1371/journal.pone.0295398.g010

Fig 14. Decision fusion output value of "clear (清)". (a) First-stage gesture. (b) Second-stage gesture. https://doi.org/10.1371/journal.pone.0295398.g014
… $S_i = \{S^1_{i_1}, S^2_{i_2}\}$ at the previous time.
• B: Probability distribution of observations, $B = \{b_j(o_t)\}$, where $b_j(o_t)$ is the probability of observed value $o_t = \{o^1_t, o^2_t\}$ when the model is in state $S_j$.